SpamRank -- Fully Automatic Link Spam Detection

Authors

  • András A. Benczúr
  • Károly Csalogány
  • Tamás Sarlós
  • Máté Uher

Abstract

Spammers intend to increase the PageRank of certain spam pages by creating a large number of links pointing to them. We propose a novel method based on the concept of personalized PageRank that detects pages with an undeserved high PageRank value without the need of any kind of white or blacklists or other means of human intervention. We assume that spammed pages have a biased distribution of pages that contribute to the undeserved high PageRank value. We define SpamRank by penalizing pages that originate a suspicious PageRank share and personalizing PageRank on the penalties. Our method is tested on a 31 M page crawl of the .de domain with a manually classified 1000-page stratified random sample with bias towards large PageRank values.
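The abstract's idea of "personalizing PageRank on the penalties" can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the penalty values, and the function name personalized_pagerank are all hypothetical, and the step that derives penalties from a suspicious PageRank share is not reproduced here. The sketch only shows standard personalized PageRank by power iteration, with the teleport vector concentrated on the penalized pages.

```python
def personalized_pagerank(out_links, teleport, damping=0.85, iterations=50):
    """Power iteration for PageRank with a custom teleport distribution.

    out_links: dict mapping each page to the list of pages it links to
               (every link target must also appear as a key).
    teleport:  dict of non-negative weights; pages not listed get weight 0.
    """
    nodes = list(out_links)
    total = sum(teleport.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("teleport vector must have positive total weight")
    # Normalize the teleport vector so that it sums to 1.
    v = {n: teleport.get(n, 0.0) / total for n in nodes}
    rank = dict(v)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * v[n] for n in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
            else:
                # Dangling page: redistribute its rank along the teleport vector.
                for n in nodes:
                    nxt[n] += damping * rank[src] * v[n]
        rank = nxt
    return rank


# Hypothetical toy graph: two farm pages boost "target", which links back to the farm.
graph = {
    "a": ["b"],
    "b": ["a", "target"],
    "farm1": ["target"],
    "farm2": ["target"],
    "target": ["farm1"],
}

# Hypothetical penalties: assume an earlier step flagged the two farm pages.
penalties = {"farm1": 1.0, "farm2": 1.0}

pagerank = personalized_pagerank(graph, {n: 1.0 for n in graph})
spam_score = personalized_pagerank(graph, penalties)
```

In this toy setup the penalty-personalized score reaches only pages that are reachable from the penalized pages, so "a" and "b" receive no score while "target" accumulates a large share.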

Similar resources

R-SpamRank: A Spam Detection Algorithm Based on Link Analysis

Spam web pages intend to achieve higher-than-deserved ranking by various techniques. While human experts could easily identify spam web pages, the manual evaluating process of a large number of pages is still time consuming and cost consuming. To assist manual evaluation, we propose an algorithm to assign spam values to web pages and semi-automatically select potential spam web pages. We first ...

Using Rank Propagation and Probabilistic Counting for Link-Based Spam Detection

This paper describes a technique for automating the detection of Web link spam, that is, groups of pages that are linked together with the sole purpose of obtaining an undeservedly high score in search engines. The problem of Web spam is widespread and difficult to solve, mostly due to the large size of web collections that makes many algorithms unfeasible in practice. For spam detection we app...

Link-Based Characterization and Detection of Web Spam

We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics such as degree correlations, number of neighbors, rank propagation through links, TrustRank and others to build several automatic web spam classifiers. This paper presents a study of the performance of each of these classifiers alone, as well as their combined performance. ...

One Way to Detecting of Link Spam

This article considers a method of link spam detection. The work focuses on detecting “paid” links as a kind of link spam. The significant characteristics of “paid” links are analysed, and a link spam detection algorithm based on this information is described. Finally, the results of running the algorithm are given.

Mining Page Farms and Its Application in Link Spam Detection

Understanding the general relations of Web pages and their environments is important with a few interesting applications such as Web spam detection. In this thesis, we study the novel problem of page farm mining and its application in link spam detection. A page farm is the set of Web pages contributing to (a major portion of) the PageRank score of a target page. We show that extracting page fa...

Publication date: 2005